This document describes the replication code for the signal-noise decomposition and noise share estimation analysis of the SOFIA (Secured Overnight Funding Interbank Average) reference rate. The analysis uses state-space (local level) models estimated via maximum likelihood to (i) decompose each SOFIA variant into a latent efficient rate and a transitory noise component, and (ii) estimate relative noise shares across pairs of alternative SOFIA constructions.
James Brugler, Calebe De Roure, Marta Khomyn, Max Prakoso, Talis Putnins
The replication package consists of two Jupyter notebooks (which execute the analysis) and three Python modules (which contain model definitions and helper functions). The table below summarises each file and its role.
| File | Type | Purpose |
|---|---|---|
| sofia_signal_noise_extract_final.ipynb | Notebook | Estimates the univariate local level model for each SOFIA variant individually, extracting the daily smoothed state (efficient rate) and noise (measurement disturbance). |
| sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb | Notebook | Estimates the bivariate local level model for each (base, alternative) SOFIA pair, computing relative noise shares and Wald tests for equal noise variances. |
| stsp_ll_mods.py | Module | Defines and estimates the univariate local level state-space model. Called by the signal-noise extraction notebook. |
| stsp_mods.py | Module | Defines and estimates the bivariate (contemporaneous) local level state-space model. Called by the noise shares notebook. |
| noiseshares.py | Module | Computes noise share ratios from the estimated bivariate model parameters. Called by the noise shares notebook. |
The two notebooks are independent of each other and can be run in either order. Each notebook calls one or more of the Python modules. The dependency graph is:
sofia_signal_noise_extract_final.ipynb
└── stsp_ll_mods.py
    └── stspestmp_ll() [univariate local level estimation]
sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb
├── stsp_mods.py
│   └── stspestmp_contemp() [bivariate local level estimation]
└── noiseshares.py
    └── noiseshares_contemp() [noise share computation]
All three .py modules must be located in the same working directory as the notebooks (or on the Python path).
Both analyses are built on the local level (random walk plus noise) state-space framework, estimated via maximum likelihood using the Kalman filter as implemented in statsmodels.tsa.statespace.MLEModel.
File: stsp_ll_mods.py → called by sofia_signal_noise_extract_final.ipynb
For each individual reference rate $y_t$, the model is:
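$$y_t = \mu_t + \beta \cdot ONR_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0,\, \sigma^2_{\varepsilon})$$
$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim N(0,\, \sigma^2_{\eta})$$
where $\mu_t$ is the latent efficient rate, $ONR_t$ is the overnight rate (an exogenous control, included only when it varies in the sample), $\varepsilon_t$ is the transitory noise, and $\eta_t$ is the innovation to the efficient rate.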
The Kalman smoother provides $E[\mu_t \mid y_1, \ldots, y_T]$ (the smoothed state) and $E[\varepsilon_t \mid y_1, \ldots, y_T]$ (the smoothed measurement disturbance) for each day $t$, enabling a full signal-noise decomposition of the observed rate.
Parameters estimated: $\sigma^2_{\eta}$ (state innovation variance), $\sigma^2_{\varepsilon}$ (noise variance), and $\beta$ (ONR coefficient, if applicable).
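To make the decomposition concrete, the following is a minimal pure-Python sketch of the Kalman filter and smoother for the local level model without the ONR term. It is not the code in stsp_ll_mods.py: the variances are treated as known rather than estimated by maximum likelihood, the function name `local_level_smooth` is hypothetical, and a large initial state variance is assumed as a stand-in for diffuse initialisation.

```python
def local_level_smooth(y, var_eps, var_eta, init_var=1e7):
    """Filter and smooth y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Hypothetical helper: variances are taken as given, and a large init_var
    approximates diffuse initialisation of the state.
    """
    n = len(y)
    a_pred, p_pred = [0.0] * n, [0.0] * n   # one-step-ahead state mean / variance
    a_filt, p_filt = [0.0] * n, [0.0] * n   # filtered state mean / variance
    a, p = 0.0, init_var
    for t in range(n):
        a_pred[t], p_pred[t] = a, p
        gain = p / (p + var_eps)                # Kalman gain
        a_filt[t] = a + gain * (y[t] - a)       # update with the prediction error
        p_filt[t] = p * (1.0 - gain)
        a, p = a_filt[t], p_filt[t] + var_eta   # random-walk prediction step

    # Backward (Rauch-Tung-Striebel) pass giving E[mu_t | y_1, ..., y_T].
    smoothed = a_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        smoothed[t] = a_filt[t] + j * (smoothed[t + 1] - a_pred[t + 1])

    # With no regressors, the smoothed measurement disturbance is the residual.
    noise = [obs - mu for obs, mu in zip(y, smoothed)]
    return smoothed, noise


rates = [4.34, 4.35, 4.33, 4.36, 4.35, 4.34]  # made-up illustrative values
signal, noise = local_level_smooth(rates, var_eps=1e-4, var_eta=1e-5)
```

By construction, `signal[t] + noise[t]` reproduces the observed rate on every day, which is the signal-noise decomposition described above.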
File: stsp_mods.py → called by sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb
For each pair of simultaneously observed rates $(y_{1,t},\, y_{2,t})$, the model is:
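$$y_{1,t} = \mu_t + d + \beta_1 \cdot ONR_t + \varepsilon_{1,t}, \qquad \varepsilon_{1,t} \sim N(0,\, \sigma^2_1)$$
$$y_{2,t} = \mu_t + \beta_2 \cdot ONR_t + \varepsilon_{2,t}, \qquad \varepsilon_{2,t} \sim N(0,\, \sigma^2_2)$$
$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim N(0,\, \sigma^2_{\eta})$$
where $\mu_t$ is the latent efficient rate common to both series, $d$ is a constant offset in the first measurement equation, $ONR_t$ is the overnight rate control, and $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are the noise components of the two rates.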
Noise shares are computed as:
$$NS_j = \frac{\sigma^2_j}{\sigma^2_1 + \sigma^2_2}$$
A noise share greater than 0.5 indicates that rate $j$ contributes more than half of the total pricing noise. A Wald test for $H_0: \sigma^2_1 = \sigma^2_2$ assesses whether the difference is statistically significant.
Parameters estimated: $\sigma^2_{\eta}$, $\sigma^2_1$, $\sigma^2_2$, $d$, and $\beta_1, \beta_2$ (if applicable).
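The noise share formula and the equal-variance Wald test can be sketched in a few lines. This is an illustration only and not the contents of noiseshares.py or stsp_mods.py: the function names are hypothetical, the inputs are made-up numbers, and the test is assumed to be formed directly on the variance estimates using their estimated covariance matrix.

```python
import math


def noise_shares(var_1, var_2):
    """Return (NS_1, NS_2) with NS_j = var_j / (var_1 + var_2)."""
    total = var_1 + var_2
    return var_1 / total, var_2 / total


def wald_equal_variances(var_1, var_2, cov_11, cov_22, cov_12):
    """Wald statistic and p-value for H0: var_1 = var_2.

    cov_* are entries of the estimated covariance matrix of (var_1, var_2).
    Under H0 the statistic is asymptotically chi-squared with 1 degree of freedom.
    """
    diff = var_1 - var_2
    var_diff = cov_11 + cov_22 - 2.0 * cov_12    # variance of the difference
    stat = diff ** 2 / var_diff
    p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi2(1)
    return stat, p_value


ns_1, ns_2 = noise_shares(4e-4, 1e-4)  # made-up variance estimates
stat, p_value = wald_equal_variances(4e-4, 1e-4, 1.5e-8, 1.0e-8, 0.125e-8)
```

In this made-up case rate 1 carries 80% of the total noise variance, and the two shares always sum to one.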
Both models are estimated via a two-stage L-BFGS-B optimisation procedure, implemented in the optim_details() function within each module.
Variance parameters are constrained to be strictly positive via a squaring transformation with a floor of $10^{-7}$:
$$\sigma^2 = \tilde{\sigma}^2 + 10^{-7}$$
where $\tilde{\sigma}$ is the unconstrained parameter value from the optimiser. This ensures numerical stability in the Kalman filter.
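As a small illustration of this mapping, the sketch below implements the squaring transformation and its inverse as standalone functions. The function names are hypothetical; in a statsmodels `MLEModel` subclass this logic would normally sit in the `transform_params` and `untransform_params` methods, but how the modules here arrange it is not shown in this document.

```python
import math

VAR_FLOOR = 1e-7  # floor stated in the text


def transform_variance(unconstrained):
    """Map any real optimiser value to a variance of at least VAR_FLOOR."""
    return unconstrained ** 2 + VAR_FLOOR


def untransform_variance(variance):
    """Inverse map (non-negative root), e.g. to express start values in optimiser space."""
    return math.sqrt(max(variance - VAR_FLOOR, 0.0))
```

Because the optimiser value is squared, its sign does not matter, and a value of zero yields exactly the floor rather than a degenerate zero variance.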
All models use diffuse initialisation for the state variable, reflecting the assumption that the initial level of the efficient rate is unknown a priori.
Estimation across multiple reference rates (or rate pairs) is parallelised using Python's multiprocessing.Pool.map(). Each estimation function is designed to accept a single list argument to conform to the map() interface. The number of worker processes is set to cpu_count() - 4, leaving cores available for other processes.
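The pattern described above can be sketched as follows. `estimate_one` is a hypothetical stand-in for the real estimation functions (it only averages its input), and the `max(1, ...)` guard on the worker count is an assumption added here so that the sketch also works on machines with four or fewer cores.

```python
from multiprocessing import Pool, cpu_count


def worker_count(reserve=4):
    """cpu_count() - reserve, but never fewer than one worker."""
    return max(1, cpu_count() - reserve)


def estimate_one(args):
    """Hypothetical stand-in for an estimation function.

    Pool.map() passes exactly one argument per task, so the inputs are packed
    into a single list and unpacked here.
    """
    name, values = args
    return name, sum(values) / len(values)  # placeholder for the real estimation


def estimate_all(inputs, mp_cores=None):
    """Run estimate_one over all input lists in parallel, preserving input order."""
    with Pool(processes=mp_cores or worker_count()) as pool:
        return pool.map(estimate_one, inputs)
```

When used in a script rather than a notebook, a call such as `estimate_all([["rate_a", [4.34, 4.36]]])` should be placed under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn rather than fork worker processes.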
All input files are expected in a Data/ subdirectory relative to the notebook working directory.
| File | Source | Contents |
|---|---|---|
| f01d.xlsx | RBA Statistical Table F1 | Cash rate target (ONR) and actual overnight rate (AONIA). First 10 rows are metadata headers. |
| sofia-beta-version_april_2024_corected.xlsx | ASX | ASX-published SOFIA beta rates (VWAP and volume-weighted median). First 4 rows are headers; last 10 rows are footnotes. |
| allSOFIAslonger.xlsx | RBA | Synthetic SOFIA variants computed under different transaction filtering rules (outlier thresholds, minimum counterparty counts). |
| sofia_replicate.xlsx | RBA | RBA-replicated base SOFIA rates, used for verification against ASX-published values. |
| SOFIARelParty.xlsx | RBA | SOFIA rates computed after excluding related-party transactions, used for verification. |
The estimation sample runs from 2022-01-04 to 2025-03-19. There are four dates within this window where SOFIA is not computed due to insufficient transaction volume. These missing values are handled via a configurable fill method (default: substitute from RBA base SOFIA).
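A minimal pandas sketch of the default fill is shown below. The dates and rates are made up, and the assumption that the gap sits in an alternative variant (here `sofia_vwap_0525`) is for illustration only; the notebooks' own fill logic and option names are not reproduced here.

```python
import numpy as np
import pandas as pd

# Made-up rates on four dates; NaN marks a date on which the variant is not computed.
df = pd.DataFrame(
    {
        "sofia_vwap_rba": [4.34, 4.35, 4.35, 4.36],
        "sofia_vwap_0525": [4.34, np.nan, 4.36, 4.36],
    },
    index=pd.to_datetime(["2024-03-04", "2024-03-05", "2024-03-06", "2024-03-07"]),
)

# Dates on which any estimation variable is missing.
missing_dates = df[df.isna().any(axis=1)].index

# Default fill: substitute the base SOFIA value on those dates.
filled = df.copy()
filled["sofia_vwap_0525"] = df["sofia_vwap_0525"].fillna(df["sofia_vwap_rba"])
```

After the fill, every column is complete, so the observation vectors passed to the Kalman filter contain no gaps.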
The alternative SOFIA constructions (used as ref2 in the noise share analysis) differ in their transaction filtering rules:
| Variable name | Description |
|---|---|
| sofia_vwap_0525 | 5% lower / 25% upper outlier thresholds |
| sofia_vwap_0525_2mn | Same thresholds, plus minimum 2 counterparties |
| sofia_vwap_2525 | 25% lower / 25% upper outlier thresholds |
| sofia_vwap_2525_2mn | Same thresholds, plus minimum 2 counterparties |
| sofia_vwap_relparty | Excluding related-party transactions |
Notebook: sofia_signal_noise_extract_final.ipynb
1. Load and merge data. Read the RBA cash rate data (f01d.xlsx), ASX-published SOFIA (sofia-beta-version_april_2024_corected.xlsx), and RBA synthetic SOFIA variants (allSOFIAslonger.xlsx). Merge all sources into a single DataFrame on date. Verify consistency of replicated rates via scatter plots.
2. Prepare estimation inputs. For each reference rate to be estimated (sofia_vwap_asx, sofia_vwap_rba), construct a $T \times 1$ observation vector (optionally rounded to 2 decimal places) and a $T$-vector of overnight rates for the exogenous control.
3. Estimate in parallel. Pass each rate's input list to stspestmp_ll() from stsp_ll_mods.py via multiprocessing.Pool.map(). The function selects the appropriate model class (with or without ONR control) based on whether the overnight rate varies in the sample, then runs the two-stage MLE.
4. Extract and save results. The Kalman smoother provides the daily smoothed state $E[\mu_t \mid \mathbf{y}]$ and measurement disturbance $E[\varepsilon_t \mid \mathbf{y}]$. These are plotted and saved to CSV files in Outputs/.
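The input preparation step above might look roughly like the sketch below. The function name `prepare_inputs` and the layout of the returned list are hypothetical and are not the actual argument format of stspestmp_ll(). The ONR check mirrors the rule described above, although in the package that choice is made inside the estimation function.

```python
import numpy as np


def prepare_inputs(name, rate, onr, round_dp=2):
    """Build a single list argument for one rate (layout is hypothetical)."""
    y = np.asarray(rate, dtype=float)
    if round_dp is not None:
        y = np.round(y, round_dp)        # optional rounding to 2 decimal places
    y = y.reshape(-1, 1)                 # T x 1 observation vector
    onr = np.asarray(onr, dtype=float)   # T-vector for the exogenous control
    use_onr = bool(np.ptp(onr) > 0)      # keep the ONR term only if it varies
    return [name, y, onr, use_onr]


# Made-up values: the overnight rate is constant, so no ONR control is needed.
inputs = prepare_inputs("sofia_vwap_rba", [4.3449, 4.3512, 4.3391], [4.35, 4.35, 4.35])
```

Packing everything into one list keeps the function compatible with `Pool.map()`, which hands each worker a single argument.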
Notebook: sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb
1. Load and merge data. Same data loading and merging steps as in sofia_signal_noise_extract_final.ipynb, with the addition of the related-party SOFIA file for verification.
2. Handle missing values. Identify dates where any estimation variable is missing. Apply a configurable fill method (default: substitute from sofia_vwap_rba) to ensure complete observation vectors for the Kalman filter.
3. Plot rate spreads. For each (base, alternative) pair, plot both rates less the cash rate target over time to visualise differences in benchmark construction.
4. Prepare estimation inputs. For each alternative rate, construct a $T \times 2$ observation matrix (column 0 = base rate, column 1 = alternative) and the overnight rate control vector.
5. Estimate in parallel. Pass each pair's input list to stspestmp_contemp() from stsp_mods.py via multiprocessing.Pool.map(). The function estimates the bivariate local level model and returns parameters, smoothed disturbances, and a Wald test for equal noise variances.
6. Compute noise shares. Call noiseshares_contemp() from noiseshares.py to compute $NS_j = \sigma^2_j / (\sigma^2_1 + \sigma^2_2)$ for each pair.
7. Report results. Display noise shares, Wald test p-values, and full parameter estimates for each alternative rate.
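The pairing of the base rate with each alternative can be sketched with NumPy as below. All numbers are made up, and the three-element list layout is a hypothetical illustration and not the actual argument format of stspestmp_contemp().

```python
import numpy as np

base = np.array([4.34, 4.35, 4.35, 4.36])   # made-up base rate (e.g. sofia_vwap_rba)
alternatives = {                             # made-up alternative constructions
    "sofia_vwap_0525": np.array([4.34, 4.36, 4.35, 4.36]),
    "sofia_vwap_2525": np.array([4.33, 4.35, 4.36, 4.37]),
}
onr = np.full(4, 4.35)                       # cash rate target, constant in this toy sample

pair_inputs = []
for name, alt in alternatives.items():
    obs = np.column_stack([base, alt])       # T x 2: column 0 = base, column 1 = alternative
    pair_inputs.append([name, obs, onr])     # one list per (base, alternative) pair

# Spreads to the cash rate target, as used for the rate spread plots.
spreads = {name: alt - onr for name, alt in alternatives.items()}
```

Keeping the base rate in column 0 for every pair means that $\sigma^2_1$ always refers to the base construction and $\sigma^2_2$ to the alternative.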
sofia_signal_noise_extract_final.ipynb
- Outputs/sign_noise_{rate}_apr2025.csv: Daily signal-noise decomposition for each rate, containing columns for the date, observed rate, smoothed state, state innovation, measurement disturbance, overnight rate, and fitted value.
sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb
- statsmodels estimation summaries for each bivariate model.
The code was developed and tested with the following packages. The notebooks print exact version numbers at runtime for reproducibility.
| Package | Role |
|---|---|
| Python 3.x | Runtime |
| pandas | Data manipulation and merging |
| numpy | Array operations and rounding |
| statsmodels | State-space model estimation (Kalman filter/smoother, MLE) |
| matplotlib | Plotting |
| multiprocessing | Parallel estimation across rates/pairs |
| openpyxl | Reading .xlsx input files (pandas backend) |
- The three .py modules (stsp_ll_mods.py, stsp_mods.py, noiseshares.py) in the same directory as the notebooks.
- A Data/ subdirectory containing the input files listed above.
- An Outputs/ subdirectory for CSV output.
Note on parallelisation: The notebooks use multiprocessing.Pool with cpu_count() - 4 worker processes. On machines with fewer than 5 cores this expression is zero or negative, so set mp_cores to at least 1.